---
title: Title
keywords: fastai
sidebar: home_sidebar
---
{% raw %}
{% endraw %}

The first thing we need to do is download the data.

First of all, let's install the Kaggle API client and configure it with our API credentials.
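The install and configuration steps aren't shown above; here is a minimal sketch, assuming you have already generated an API token (kaggle.json) from your Kaggle account page (Account -> Create New API Token):

```shell
# Install the Kaggle CLI.
pip install kaggle

# The CLI looks for credentials in ~/.kaggle/kaggle.json by default.
mkdir -p ~/.kaggle
mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json  # adjust the source path to wherever you saved the token
chmod 600 ~/.kaggle/kaggle.json                   # the CLI complains if the file is world-readable
```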

Let's create a folder to download the competition files.

{% raw %}
mkdir data
{% endraw %}

Let's download the files now (make sure you accept the competition rules on Kaggle first, or else you will not be able to access the data!).

{% raw %}
!kaggle competitions download birdsong-recognition -p data
Downloading birdsong-recognition.zip to data
100%|█████████████████████████████████████▉| 22.1G/22.1G [06:45<00:00, 83.2MB/s]
100%|██████████████████████████████████████| 22.1G/22.1G [06:45<00:00, 58.5MB/s]
{% endraw %} {% raw %}
!ls data
birdsong-recognition.zip
{% endraw %} {% raw %}
%%capture
!unzip data/birdsong-recognition.zip -d data
{% endraw %} {% raw %}
!ls data
birdsong-recognition.zip	 example_test_audio_summary.csv  train_audio
example_test_audio		 sample_submission.csv		 train.csv
example_test_audio_metadata.csv  test.csv
{% endraw %}

Let's take a look at train.csv.

{% raw %}
import pandas as pd

train = pd.read_csv('data/train.csv')
train.shape
(21375, 35)
{% endraw %} {% raw %}
train.head()
rating playback_used ebird_code channels date pitch duration filename speed species ... xc_id url country author primary_label longitude length time recordist license
0 3.5 no aldfly 1 (mono) 2013-05-25 Not specified 25 XC134874.mp3 Not specified Alder Flycatcher ... 134874 https://www.xeno-canto.org/134874 United States Jonathon Jongsma Empidonax alnorum_Alder Flycatcher -92.962 Not specified 8:00 Jonathon Jongsma Creative Commons Attribution-ShareAlike 3.0
1 4.0 no aldfly 2 (stereo) 2013-05-27 both 36 XC135454.mp3 both Alder Flycatcher ... 135454 https://www.xeno-canto.org/135454 United States Mike Nelson Empidonax alnorum_Alder Flycatcher -82.1106 0-3(s) 08:30 Mike Nelson Creative Commons Attribution-NonCommercial-Sha...
2 4.0 no aldfly 2 (stereo) 2013-05-27 both 39 XC135455.mp3 both Alder Flycatcher ... 135455 https://www.xeno-canto.org/135455 United States Mike Nelson Empidonax alnorum_Alder Flycatcher -82.1106 0-3(s) 08:30 Mike Nelson Creative Commons Attribution-NonCommercial-Sha...
3 3.5 no aldfly 2 (stereo) 2013-05-27 both 33 XC135456.mp3 both Alder Flycatcher ... 135456 https://www.xeno-canto.org/135456 United States Mike Nelson Empidonax alnorum_Alder Flycatcher -82.1106 0-3(s) 08:30 Mike Nelson Creative Commons Attribution-NonCommercial-Sha...
4 4.0 no aldfly 2 (stereo) 2013-05-27 both 36 XC135457.mp3 level Alder Flycatcher ... 135457 https://www.xeno-canto.org/135457 United States Mike Nelson Empidonax alnorum_Alder Flycatcher -82.1106 0-3(s) 08:30 Mike Nelson Creative Commons Attribution-NonCommercial-Sha...

5 rows × 35 columns

{% endraw %}

We can listen to the longest recording in the train set:

{% raw %}
from IPython.lib.display import Audio

Audio('data/train_audio/comrav/XC246425.mp3')
{% endraw %}

How many different birds are there in the train set?

{% raw %}
train.ebird_code.nunique()
264
{% endraw %}

This is what sampling rates across the recordings look like:

{% raw %}
train.sampling_rate.value_counts()
44100 (Hz)    12693
48000 (Hz)     8373
22050 (Hz)      123
32000 (Hz)       93
24000 (Hz)       54
16000 (Hz)       34
11025 (Hz)        3
8000 (Hz)         2
Name: sampling_rate, dtype: int64
{% endraw %}

While all the information in train.csv might be useful, we have to focus on the fundamentals first. We know we will be tasked with predicting the ebird_code. Also, working with multiple recordings per class, each with a different sampling rate, is unwieldy - let us make life easier for ourselves.

Let us resample all the files to 48 kHz and combine them to have a single file per class. And let us do it all in Python.

!!! WARNING !!!

The code below was run on a 96 vCPU VM - it might take a really long time on a standard machine. To save you the hassle, I uploaded the transformed data to GCP storage here. The files are zipped, so you will have to extract them after the download completes.

{% raw %}
{% endraw %} {% raw %}
mkdir data/train
{% endraw %} {% raw %}
{% endraw %}

In the course of processing the files I found that I could not read data/train_audio/lotduc/XC195038.mp3, so let's remove it to avoid an error.

{% raw %}
from pathlib import Path

Path('data/train_audio/lotduc/XC195038.mp3').unlink()
{% endraw %} {% raw %}
import librosa

SAMPLE_RATE = 48000  # we resample everything to 48 kHz

def read_audio(path): return librosa.load(path, sr=SAMPLE_RATE, mono=True)[0]
{% endraw %} {% raw %}
%%time
import numpy as np
import soundfile as sf
from multiprocessing import Pool, cpu_count

NUM_WORKERS = cpu_count()

for directory in Path('data/train_audio').iterdir():
    ebird_code = directory.name
    file_paths = list(directory.iterdir())
    with Pool(NUM_WORKERS // 2) as p:       # resample each recording in parallel
        xs = p.map(read_audio, file_paths)
    x_out = np.concatenate(xs)              # one long waveform per class
    sf.write(f'data/train/{ebird_code}.wav', x_out, SAMPLE_RATE)
CPU times: user 4min 10s, sys: 10min 14s, total: 14min 24s
Wall time: 1h 38min 50s
{% endraw %}

Ok, so we now have all the audio saved in a convenient format, and we have offloaded the expensive resampling so that it happens before training. Great!

But how long are the files? Or in other words, how much data do we have?

{% raw %}
ebird_codes = []
durations = []
paths = []
for path in Path('data/train').iterdir():
    ebird_codes.append(path.stem)
    durations.append(sf.info(path).duration)
    paths.append(path)
{% endraw %} {% raw %}
df_len = pd.DataFrame(data={'ebird': ebird_codes, 'path': paths, 'duration': durations})
{% endraw %} {% raw %}
df_len.iloc[df_len.duration.argmin()]
ebird                      redhea
path        data/train/redhea.wav
duration                  392.605
Name: 180, dtype: object
{% endraw %} {% raw %}
df_len.iloc[df_len.duration.argmax()]
ebird                      bulori
path        data/train/bulori.wav
duration                  13776.8
Name: 70, dtype: object
{% endraw %} {% raw %}
df_len.to_pickle('data/ebird_path_duration.pkl') # saving the dataframe in case we might have
                                                 # a use for it down the road
{% endraw %}

We only have just over 6 minutes of audio for redhea, while the best represented species, bulori, has over 3.5 hours of recordings!

This information is useful as it will inform how we can construct our dataset.

In fact, let's create a simple dataset already. We will use it to train our first model on raw audio (vs using spectrograms or some other representation).

Let's begin by making our examples 5 seconds long. We can stick to fastai parlance and first establish a way of creating individual items - in other words, train examples.

{% raw %}
{% endraw %} {% raw %}

get_items[source]

get_items(n_per_class=1000)

{% endraw %}

A thousand examples per class, that should be enough to get started!

Now, without a doubt, constructing the train set will be one of the most important ingredients to doing well in this competition. Can you think of ways the method we are using right now could be improved?

Either way, for now the goal is to complete a first pass through the pipeline, from reading in the data to training our model to making predictions.

Let's construct a simple dataset to read in the items.

{% raw %}
import soundfile as sf
from torch.utils.data import Dataset

class AudioDataset(Dataset):
    def __init__(self, items, classes):
        self.items = items   # (class, path, offset) tuples
        self.vocab = classes
    def __getitem__(self, idx):
        cls, path, offset = self.items[idx]
        x, _ = sf.read(path, SAMPLE_RATE*5, start=offset)  # read a 5-second clip
        return x, self.vocab.index(cls)
    def __len__(self):
        return len(self.items)
{% endraw %}

We have a basic dataset class; now let us figure out how to split our items into a train and a validation set in a reasonable way. One reasonable approach is to stratify the splits by class. Given that our dataset is not imbalanced - we have 1000 examples per category - this is not strictly necessary, and random sampling would serve us fine.

Nonetheless, let's do it the proper way. It is usually best to subscribe to YAGNI when writing code, but here I have a strong feeling this functionality will be useful down the road.

{% raw %}
{% endraw %} {% raw %}

trn_val_split_items[source]

trn_val_split_items(items, n_splits=5)

{% endraw %} {% raw %}
trn_idxs, val_idxs = trn_val_split_items(items, 10)[0]
{% endraw %} {% raw %}
trn_idxs.shape, val_idxs.shape
((211200,), (52800,))
{% endraw %}
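The body of trn_val_split_items is likewise collapsed; below is a minimal sketch using scikit-learn's StratifiedKFold, assuming the class label is the first element of each item (the seed is my own choice, not from the original).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def trn_val_split_items(items, n_splits=5):
    """Stratified K-fold split over items, stratifying on the class.

    Returns a list of (trn_idxs, val_idxs) pairs, one per fold.
    A sketch - the actual implementation may differ.
    """
    labels = [item[0] for item in items]
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    # X is only used for its length here, so a dummy array suffices
    return list(skf.split(np.zeros((len(labels), 1)), labels))
```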

Sweet! Time to instantiate our datasets!

{% raw %}
trn_ds = AudioDataset(items[trn_idxs], classes)
val_ds = AudioDataset(items[val_idxs], classes)
{% endraw %} {% raw %}
len(trn_ds), len(val_ds)
(211200, 52800)
{% endraw %} {% raw %}
trn_ds[0][0].mean()
-8.646138509114583e-06
{% endraw %} {% raw %}
trn_ds[0][0].std()
0.027123435715929924
{% endraw %}

Such a small standard deviation might make it hard for our network to learn. Also, let's ensure our data is indeed zero-centered.

Our AudioDataset class might need some changes.

{% raw %}
{% endraw %} {% raw %}

calculate_mean_and_std[source]

calculate_mean_and_std(items, trn_idxs)

{% endraw %} {% raw %}

class AudioDataset[source]

AudioDataset(items, classes, mean=None, std=None) :: Dataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__, which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.

Note: DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

{% endraw %} {% raw %}
mean, std = calculate_mean_and_std(items, trn_idxs)
{% endraw %} {% raw %}
trn_ds = AudioDataset(items[trn_idxs], classes, mean, std)
val_ds = AudioDataset(items[val_idxs], classes, mean, std)
{% endraw %} {% raw %}
trn_ds[0][0].mean(), trn_ds[0][0].std()
(0.00068348495, 0.43455595)
{% endraw %}

This should be much better!

We have our data, we have a basic dataset, we achieved everything we wanted here. Time to start training!